Presenter: Tony Liang
October 31, 2024
Circulating free DNA (cfDNA) consists of DNA fragments released into the bloodstream
The fraction of cfDNA released from cancer or tumor cells is called circulating tumor DNA (ctDNA)
cfDNA carries genetic and epigenetic changes that could reveal the cells from which it originated
Current cfDNA screening tests can detect the presence of abnormal signals but cannot identify the cancer type or the tumor’s tissue of origin (TOO)
Limitations of existing methods
The authors came up with their own reference-based deconvolution method…
In some sense, it “combines” existing methodologies such as non-negative least squares (NNLS) and matrix factorization.
\[ f(A) = \sum\limits_{i=1}^n \sum\limits_{k=1}^p W_{ik} \, \Big| \underbrace{R_{ik}^{\text{(cfdna)}}}_{(1)} - \underbrace{\sum\limits_{j=1}^m A_{ij} B_{jk}}_{(2)} \Big| \]
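As a rough illustration of the objective above, the following NumPy sketch evaluates \(f(A)\) for a toy problem. It assumes that \(B\) plays the role of the reference methylation matrix (cell types by markers) and that \(W\) holds per-entry weights; the slides do not define either explicitly, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def weighted_l1_objective(A, B, R_cfdna, W):
    """Evaluate f(A) = sum_i sum_k W[i, k] * |R_cfdna[i, k] - (A @ B)[i, k]|.

    A: (n, m) estimated contributions, B: (m, p) reference profiles (assumed),
    R_cfdna: (n, p) observed cfDNA methylation, W: (n, p) weights (assumed).
    """
    residual = R_cfdna - A @ B          # term (1) minus term (2)
    return float(np.sum(W * np.abs(residual)))

# Toy example: n=2 samples, m=2 cell types, p=3 markers (made-up numbers).
B = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.5, 0.0]])
A_true = np.array([[0.5, 0.5],
                   [0.75, 0.25]])
R_cfdna = A_true @ B                    # noise-free mixture
W = np.ones_like(R_cfdna)               # uniform weights for simplicity

loss_true = weighted_l1_objective(A_true, B, R_cfdna, W)
A_wrong = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
loss_wrong = weighted_l1_objective(A_wrong, B, R_cfdna, W)
print(loss_true, loss_wrong)
```

The true contributions give zero loss on noise-free data, while a wrong guess gives a positive loss; a deconvolution method would search for the \(A\) that minimizes this value under its constraints.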
Some math behind how MetDecode addresses unknown cell-type contributors
MetDecode accounts for \(h\) unknown contributors in the cfDNA mixture by adding \(h\) extra rows to \(R^{\text{(atlas)}}\)
\[ R_{hk}^{\text{(atlas)}} = \begin{cases} R_k^{lb}, & e_k > 0 \\ R_k^{ub}, & \text{otherwise} \end{cases} \quad \text{where} \quad e_k = \text{median}_i \Big( -R_{ik}^{\text{(cfdna)}} + \sum\limits_{j} \alpha_{ij} R_{jk}^{\text{(atlas)}} \Big) \]
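The rule above can be sketched in a few lines of NumPy. This is only an illustration of the formula as written on the slide, not the authors' implementation: it assumes \(R_k^{lb}\) and \(R_k^{ub}\) are per-marker lower and upper bounds supplied by the caller (the slides do not say how they are chosen), and the function name and toy values are hypothetical.

```python
import numpy as np

def unknown_contributor_row(R_cfdna, alpha, R_atlas, R_lb, R_ub):
    """Build one extra atlas row for an unknown contributor.

    e_k is the median over samples i of (atlas reconstruction - observation).
    Where e_k > 0 the row takes the lower bound, otherwise the upper bound.
    """
    residual = alpha @ R_atlas - R_cfdna    # shape (n samples, p markers)
    e = np.median(residual, axis=0)         # e_k for each marker k
    return np.where(e > 0, R_lb, R_ub)

# Toy example: 3 samples, 1 known cell type, 2 markers (made-up numbers).
R_atlas = np.array([[0.2, 0.8]])
alpha = np.array([[1.0], [1.0], [1.0]])
R_cfdna = np.array([[0.1, 0.9],
                    [0.1, 0.9],
                    [0.3, 0.9]])
R_lb = np.array([0.0, 0.0])                 # assumed lower bounds
R_ub = np.array([1.0, 1.0])                 # assumed upper bounds

row = unknown_contributor_row(R_cfdna, alpha, R_atlas, R_lb, R_ub)
print(row)
```

In this toy case the atlas over-predicts the first marker in most samples, so the extra row takes the lower bound there, and under-predicts the second marker, so the row takes the upper bound there.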
Pearson correlation coefficient \(\rho\) and mean squared error (MSE) are used to evaluate MetDecode's estimates
Accuracy is used to evaluate multiclass cancer TOO prediction, and Cohen’s kappa is used to adjust for the multiclass nature of the problem
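A minimal sketch of the two estimation metrics, Pearson's \(\rho\) and MSE, using NumPy. The arrays below are made-up contribution values for illustration only and are not taken from the paper.

```python
import numpy as np

def pearson_rho(y_true, y_est):
    """Pearson correlation between true and estimated contributions."""
    return float(np.corrcoef(y_true, y_est)[0, 1])

def mean_squared_error(y_true, y_est):
    """Average squared difference between true and estimated values."""
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    return float(np.mean((y_true - y_est) ** 2))

# Hypothetical true vs. estimated contributions of one cell type.
y_true = np.array([0.1, 0.2, 0.3, 0.4])
y_est = np.array([0.2, 0.3, 0.4, 0.5])   # constant offset of 0.1

rho = pearson_rho(y_true, y_est)
mse = mean_squared_error(y_true, y_est)
print(rho, mse)
```

The example shows why the two metrics are reported together: a constant bias leaves the correlation essentially perfect while the MSE stays above zero.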
Some notation
\[ \text{MSE} = \frac{1}{n} \sum\limits_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
\[ \kappa = \frac{p_o - p_e}{1 - p_e}, \quad p_e = \frac{1}{N^2} \sum\limits_{k=1}^K n_{k1} n_{k2} \]
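where \(p_o\) is the observed agreement (the accuracy), \(p_e\) is the agreement expected by chance, \(N\) is the total number of samples, \(n_{k1}\) is the number of times label \(k\) appears in the predictions, and \(n_{k2}\) is the number of times label \(k\) is a true label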
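To make the kappa formula concrete, here is a small pure-Python sketch that follows it term by term. The cancer labels are hypothetical placeholders, and the function does not guard against the degenerate case \(p_e = 1\).

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # p_o: fraction of samples where prediction and truth agree (accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pred_counts = Counter(y_pred)   # n_k1: times label k is predicted
    true_counts = Counter(y_true)   # n_k2: times label k is the true label
    labels = set(y_true) | set(y_pred)
    # p_e: agreement expected by chance from the label frequencies.
    p_e = sum(pred_counts[k] * true_counts[k] for k in labels) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels, for illustration only.
y_true = ["breast", "breast", "ovarian", "ovarian"]
y_pred = ["breast", "ovarian", "ovarian", "ovarian"]

kappa = cohens_kappa(y_true, y_pred)
print(kappa)
```

Here the accuracy is 0.75 but the chance agreement is 0.5, so kappa is 0.5: half of the possible improvement over chance is achieved.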
The evaluation used 50 simulation runs, each containing \(5000\) simulated cfDNA samples.
The Pearson correlation coefficient was then computed for each deconvolution algorithm
After averaging all correlation coefficients, MetDecode's was significantly higher than those of all other approaches
Deconvolution of genomic DNA methylation profiles
High correlation between complete blood count measurements and MetDecode deconvolution estimates
MetDecode without an unknown contributor outperformed NNLS in terms of average Pearson correlation and MSE
MetDecode with 1 unknown contributor performs best based on Cohen’s kappa
All methods perform equally poorly, with \(< 50\%\) accuracy, when predicting all samples
Performance is closer when looking only at the \(19\) samples with tumor fraction \(> 3\%\)
How could one utilize cfDNA?
cfDNA epigenetic signatures can be used to deduce TOO or cancer type
MetDecode is an algorithm that estimates cell-type contributions and the cancer type in a cfDNA sample
It models unknown contributors not present in the reference atlas
It also accounts for the coverage of each marker region to alleviate potential sources of noise
Why does the weighting approach improve deconvolution accuracy only for cancer components and not for blood cell types?
Why does adding an extra unknown contributor sometimes yield better results and sometimes not?
Cell-type deconvolution still seems hard (low accuracy in predicting cancer type); what is the next step?
As an aside, can you always just combine existing approaches to get a “new” method?